SUMMARY: This project aims to construct a predictive model using a TensorFlow convolutional neural network (CNN) and document the end-to-end steps using a template. The Belgium Traffic Sign dataset presents a multi-class classification problem in which we attempt to predict one of several (more than two) possible outcomes.
INTRODUCTION: This dataset contains over 7,200 images covering 62 classes of traffic signs used in Belgium. The researcher performed experiments on the dataset to create a CNN-based classification system.
ANALYSIS: The InceptionV3 model achieved an accuracy score of 98.84% on the training dataset after 10 epochs. When we applied the model to the validation dataset, it achieved an accuracy score of 93.73%.
CONCLUSION: In this iteration, the TensorFlow InceptionV3 CNN model appeared suitable for modeling this dataset.
Dataset ML Model: Multi-class classification with image data
Dataset Used: Belgium_Traffic_Sign_image_data_62_class_data
Dataset Reference: https://www.kaggle.com/datasets/abhi8923shriv/belgium-ts
One source of potential performance benchmarks: https://www.kaggle.com/datasets/abhi8923shriv/belgium-ts/code
# Retrieve CPU information from the system
ncpu = !nproc
print("The number of available CPUs is:", ncpu[0])
The number of available CPUs is: 2
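The `!nproc` call is a Jupyter/Colab shell magic that works only on Linux hosts. A portable alternative (an aside, not part of the original notebook) uses only the standard library:

```python
import os

# Portable CPU count without shell magics; os.cpu_count() can
# return None on exotic platforms, so fall back to 1
ncpu = os.cpu_count() or 1
print("The number of available CPUs is:", ncpu)
```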
# Retrieve memory configuration information
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))
Your runtime has 13.6 gigabytes of available RAM
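Note that dividing by `1e9` reports decimal gigabytes (GB), not binary gibibytes (GiB); the 13.6 GB shown corresponds to roughly 12.7 GiB. A quick sketch with an assumed byte total for illustration:

```python
# Assumed total from virtual_memory().total (illustrative value only)
total_bytes = 13_600_000_000
gb = total_bytes / 1e9     # decimal gigabytes, as printed above -> 13.6
gib = total_bytes / 2**30  # binary gibibytes -> roughly 12.7
print(f"{gb:.1f} GB is about {gib:.1f} GiB")
```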
# Retrieve GPU configuration information
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
print(gpu_info)
Mon Jun 13 01:13:17 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... Off | 00000000:00:04.0 Off | 0 |
| N/A 33C P0 22W / 300W | 0MiB / 16160MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
# Set the random seed number for reproducible results
RNG_SEED = 888
import random
random.seed(RNG_SEED)
import numpy as np
np.random.seed(RNG_SEED)
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import os
import sys
import math
# import boto3
import zipfile
from datetime import datetime
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
import tensorflow as tf
tf.random.set_seed(RNG_SEED)
from tensorflow import keras
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator
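The three seed calls above (Python's `random`, NumPy, and TensorFlow via `tf.random.set_seed`) are what make reruns repeatable. A minimal sketch showing the effect for the first two (TensorFlow behaves analogously):

```python
import random
import numpy as np

RNG_SEED = 888  # same seed value as the notebook uses above

def draw():
    # Re-seed both RNGs, then take one sample from each
    random.seed(RNG_SEED)
    np.random.seed(RNG_SEED)
    return random.random(), float(np.random.rand())

# Re-seeding restores the generator state, so repeated draws match exactly
assert draw() == draw()
```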
# Begin the timer for the script processing
START_TIME_SCRIPT = datetime.now()
# Set up the number of CPU cores available for multi-thread processing
N_JOBS = 1
# Set up the flag for sending progress emails (setting to True will send status emails!)
NOTIFY_STATUS = False
# Set the percentage sizes for splitting the dataset
TEST_SET_RATIO = 0.1
VAL_SET_RATIO = 0.1
# Set the number of folds for cross validation
N_FOLDS = 5
N_ITERATIONS = 1
# Set various default modeling parameters
DEFAULT_LOSS = 'categorical_crossentropy'
DEFAULT_METRICS = ['accuracy']
DEFAULT_OPTIMIZER = tf.keras.optimizers.Adam(learning_rate=0.00001)
CLASSIFIER_ACTIVATION = 'softmax'
MAX_EPOCHS = 10
BATCH_SIZE = 32
NUM_CLASSES = 62
# CLASS_LABELS = []
# CLASS_NAMES = []
# RAW_IMAGE_SIZE = (250, 250)
TARGET_IMAGE_SIZE = (299, 299)
INPUT_IMAGE_SHAPE = (TARGET_IMAGE_SIZE[0], TARGET_IMAGE_SIZE[1], 3)
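InceptionV3's default input resolution is 299×299 RGB, so the input shape is simply the target size with a 3-channel axis appended; tuple concatenation expresses the same relationship more compactly (a stylistic alternative, not a change to the notebook):

```python
TARGET_IMAGE_SIZE = (299, 299)                 # InceptionV3's default input resolution
INPUT_IMAGE_SHAPE = TARGET_IMAGE_SIZE + (3,)   # append the RGB channel axis
assert INPUT_IMAGE_SHAPE == (299, 299, 3)
```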
# Define the labels to use for graphing the data
TRAIN_METRIC = "accuracy"
VALIDATION_METRIC = "val_accuracy"
TRAIN_LOSS = "loss"
VALIDATION_LOSS = "val_loss"
# Define the directory locations and file names
STAGING_DIR = 'staging/'
TRAIN_DIR = 'staging/BelgiumTSC_Training/Training/'
VALID_DIR = 'staging/BelgiumTSC_Testing/Testing/'
TEST_DIR = ''
TRAIN_DATASET = 'belgium-traffic-sign.zip'
# VALID_DATASET = ''
# TEST_DATASET = ''
# TRAIN_LABELS = ''
# VALID_LABELS = ''
# TEST_LABELS = ''
# OUTPUT_DIR = 'staging/'
# SAMPLE_SUBMISSION_CSV = 'sample_submission.csv'
# FINAL_SUBMISSION_CSV = 'submission.csv'
# Check the number of GPUs accessible through TensorFlow
print('Num GPUs Available:', len(tf.config.list_physical_devices('GPU')))
# Print out the TensorFlow version for confirmation
print('TensorFlow version:', tf.__version__)
Num GPUs Available: 1
TensorFlow version: 2.8.2
# Set up the email notification function
def status_notify(msg_text):
    import boto3  # imported locally because the module-level import is commented out above
    access_key = os.environ.get('SNS_ACCESS_KEY')
    secret_key = os.environ.get('SNS_SECRET_KEY')
    aws_region = os.environ.get('SNS_AWS_REGION')
    topic_arn = os.environ.get('SNS_TOPIC_ARN')
    if (access_key is None) or (secret_key is None) or (aws_region is None) or (topic_arn is None):
        sys.exit("Incomplete notification setup info. Script Processing Aborted!!!")
    sns = boto3.client('sns', aws_access_key_id=access_key, aws_secret_access_key=secret_key, region_name=aws_region)
    response = sns.publish(TopicArn=topic_arn, Message=msg_text)
    if response['ResponseMetadata']['HTTPStatusCode'] != 200:
        print('Status notification not OK with HTTP status code:', response['ResponseMetadata']['HTTPStatusCode'])
if NOTIFY_STATUS: status_notify('(TensorFlow Multi-Class) Task 1 - Prepare Environment completed on ' + datetime.now().strftime('%A %B %d, %Y %I:%M:%S %p'))
if NOTIFY_STATUS: status_notify('(TensorFlow Multi-Class) Task 2 - Load and Prepare Images has begun on ' + datetime.now().strftime('%A %B %d, %Y %I:%M:%S %p'))
# Clean up the old files and download directories before receiving new ones
!rm -rf staging/
# !rm archive.zip
!mkdir staging/
if not os.path.exists(TRAIN_DATASET):
    !wget https://dainesanalytics.com/datasets/kaggle-abhi8923shriv-belgium-traffic-sign/belgium-traffic-sign.zip
--2022-06-13 01:13:24--  https://dainesanalytics.com/datasets/kaggle-abhi8923shriv-belgium-traffic-sign/belgium-traffic-sign.zip
Resolving dainesanalytics.com (dainesanalytics.com)... 65.9.85.3, 65.9.85.28, 65.9.85.71, ...
Connecting to dainesanalytics.com (dainesanalytics.com)|65.9.85.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 452884286 (432M) [application/zip]
Saving to: ‘belgium-traffic-sign.zip’

belgium-traffic-sig 100%[===================>] 431.90M  26.2MB/s    in 20s

2022-06-13 01:13:45 (21.6 MB/s) - ‘belgium-traffic-sign.zip’ saved [452884286/452884286]
with zipfile.ZipFile(TRAIN_DATASET, 'r') as zip_ref:
    zip_ref.extractall(STAGING_DIR)
# Delete all the annotation files embedded in the image folders
!rm staging/BelgiumTSC_Training/Training/*.txt
!find staging/BelgiumTSC_Training/ -name "*csv" | xargs rm
!rm staging/BelgiumTSC_Testing/Testing/*.txt
!find staging/BelgiumTSC_Testing/ -name "*csv" | xargs rm
CLASS_LABELS = os.listdir(TRAIN_DIR)
print(CLASS_LABELS)
['00017', '00025', '00002', '00056', '00014', '00013', '00038', '00004', '00020', '00031', '00045', '00028', '00057', '00015', '00046', '00024', '00018', '00034', '00009', '00043', '00005', '00019', '00061', '00049', '00012', '00010', '00059', '00052', '00058', '00029', '00055', '00023', '00053', '00027', '00022', '00011', '00021', '00048', '00033', '00037', '00003', '00040', '00030', '00047', '00026', '00039', '00036', '00044', '00050', '00054', '00060', '00042', '00006', '00051', '00000', '00016', '00035', '00008', '00007', '00001', '00032', '00041']
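Note that `os.listdir` returns entries in arbitrary, filesystem-dependent order, which is why the class folders above appear unsorted. If a stable label-to-index mapping matters across runs and machines, sorting fixes it; the `stable_class_labels` helper below is a hypothetical sketch, not part of the original notebook:

```python
import os

# Sort directory entries so the class-label ordering is deterministic;
# also skip any stray files that are not class subdirectories.
def stable_class_labels(train_dir):
    return sorted(entry for entry in os.listdir(train_dir)
                  if os.path.isdir(os.path.join(train_dir, entry)))
```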
# Brief listing of training image files for each class
for c_label in CLASS_LABELS:
    training_class_dir = os.path.join(TRAIN_DIR, c_label)
    training_class_files = os.listdir(training_class_dir)
    print('Number of training images for', c_label, ':', len(training_class_files))
    print('Training samples for', c_label, ':', training_class_files[:5], '\n')
Number of training images for 00017 : 79
Training samples for 00017 : ['00253_00001.ppm', '00845_00001.ppm', '01199_00001.ppm', '00910_00002.ppm', '00156_00001.ppm']
Number of training images for 00025 : 42
Training samples for 00025 : ['01151_00002.ppm', '01166_00002.ppm', '00696_00001.ppm', '00701_00001.ppm', '01184_00000.ppm']
Number of training images for 00002 : 13
Training samples for 00002 : ['01722_00002.ppm', '01503_00001.ppm', '01320_00000.ppm', '01722_00000.ppm', '01515_00001.ppm']
Number of training images for 00056 : 95
Training samples for 00056 : ['00296_00002.ppm', '01199_00001.ppm', '00784_00001.ppm', '00256_00001.ppm', '00784_00000.ppm']
Number of training images for 00014 : 43
Training samples for 00014 : ['00448_00001.ppm', '00408_00001.ppm', '00899_00000.ppm', '01925_00000.ppm', '01544_00000.ppm']
Number of training images for 00013 : 90
Training samples for 00013 : ['00866_00002.ppm', '01704_00002.ppm', '00869_00001.ppm', '00235_00000.ppm', '00884_00000.ppm']
Number of training images for 00038 : 285
Training samples for 00038 : ['00004_00002.ppm', '00796_00001.ppm', '00192_00002.ppm', '00686_00000.ppm', '01382_00001.ppm']
Number of training images for 00004 : 15
Training samples for 00004 : ['00145_00001.ppm', '00623_00000.ppm', '00214_00002.ppm', '01312_00001.ppm', '00623_00001.ppm']
Number of training images for 00020 : 42
Training samples for 00020 : ['01968_00002.ppm', '01968_00000.ppm', '01755_00002.ppm', '00332_00002.ppm', '01859_00002.ppm']
Number of training images for 00031 : 63
Training samples for 00031 : ['01909_00002.ppm', '01957_00002.ppm', '00272_00000.ppm', '01957_00001.ppm', '01716_00002.ppm']
Number of training images for 00045 : 74
Training samples for 00045 : ['00035_00001.ppm', '00833_00001.ppm', '00451_00001.ppm', '01637_00000.ppm', '01392_00002.ppm']
Number of training images for 00028 : 125
Training samples for 00028 : ['01896_00002.ppm', '00662_00001.ppm', '00155_00002.ppm', '01897_00001.ppm', '00040_00001.ppm']
Number of training images for 00057 : 78
Training samples for 00057 : ['01271_00002.ppm', '00472_00002.ppm', '00784_00001.ppm', '00784_00000.ppm', '00339_00000.ppm']
Number of training images for 00015 : 9
Training samples for 00015 : ['00675_00001.ppm', '00675_00000.ppm', '00709_00001.ppm', '00709_00000.ppm', '00675_00002.ppm']
Number of training images for 00046 : 44
Training samples for 00046 : ['01021_00002.ppm', '01628_00002.ppm', '00090_00001.ppm', '01116_00002.ppm', '00814_00000.ppm']
Number of training images for 00024 : 48
Training samples for 00024 : ['01845_00001.ppm', '01952_00000.ppm', '00355_00001.ppm', '01870_00002.ppm', '01280_00001.ppm']
Number of training images for 00018 : 81
Training samples for 00018 : ['00008_00000.ppm', '00365_00001.ppm', '00009_00000.ppm', '00429_00000.ppm', '01087_00002.ppm']
Number of training images for 00034 : 46
Training samples for 00034 : ['01418_00002.ppm', '01240_00001.ppm', '01260_00002.ppm', '01241_00001.ppm', '01646_00001.ppm']
Number of training images for 00009 : 18
Training samples for 00009 : ['00410_00002.ppm', '00141_00000.ppm', '00285_00001.ppm', '00285_00002.ppm', '00141_00001.ppm']
Number of training images for 00043 : 30
Training samples for 00043 : ['01589_00001.ppm', '00073_00000.ppm', '01735_00001.ppm', '00873_00001.ppm', '00878_00002.ppm']
Number of training images for 00005 : 11
Training samples for 00005 : ['00261_00000.ppm', '00261_00001.ppm', '00575_00000.ppm', '00390_00001.ppm', '00575_00002.ppm']
Number of training images for 00019 : 231
Training samples for 00019 : ['00923_00000.ppm', '00126_00002.ppm', '01344_00001.ppm', '00983_00002.ppm', '01355_00002.ppm']
Number of training images for 00061 : 282
Training samples for 00061 : ['01909_00002.ppm', '00192_00002.ppm', '01957_00002.ppm', '00951_00002.ppm', '01269_00001.ppm']
Number of training images for 00049 : 12
Training samples for 00049 : ['00311_00000.ppm', '00310_00000.ppm', '00720_00001.ppm', '00720_00000.ppm', '00258_00002.ppm']
Number of training images for 00012 : 18
Training samples for 00012 : ['01565_00002.ppm', '01565_00001.ppm', '01473_00002.ppm', '01473_00000.ppm', '01609_00001.ppm']
Number of training images for 00010 : 21
Training samples for 00010 : ['01903_00000.ppm', '01903_00001.ppm', '00289_00002.ppm', '01903_00002.ppm', '00263_00002.ppm']
Number of training images for 00059 : 42
Training samples for 00059 : ['00948_00002.ppm', '01586_00001.ppm', '01481_00001.ppm', '00350_00001.ppm', '01583_00000.ppm']
Number of training images for 00052 : 27
Training samples for 00052 : ['01131_00001.ppm', '00476_00002.ppm', '01128_00001.ppm', '00901_00002.ppm', '00476_00001.ppm']
Number of training images for 00058 : 15
Training samples for 00058 : ['01444_00000.ppm', '01444_00001.ppm', '00107_00001.ppm', '00776_00001.ppm', '00107_00000.ppm']
Number of training images for 00029 : 33
Training samples for 00029 : ['00376_00002.ppm', '01541_00002.ppm', '00413_00001.ppm', '00909_00002.ppm', '00021_00001.ppm']
Number of training images for 00055 : 12
Training samples for 00055 : ['00191_00001.ppm', '00276_00000.ppm', '01905_00001.ppm', '00191_00002.ppm', '01905_00002.ppm']
Number of training images for 00023 : 15
Training samples for 00023 : ['00535_00001.ppm', '00465_00001.ppm', '00535_00002.ppm', '00630_00000.ppm', '00785_00002.ppm']
Number of training images for 00053 : 199
Training samples for 00053 : ['01658_00001.ppm', '01554_00000.ppm', '01554_00001.ppm', '01052_00000.ppm', '01793_00002.ppm']
Number of training images for 00027 : 18
Training samples for 00027 : ['01597_00000.ppm', '01544_00000.ppm', '01544_00002.ppm', '01667_00002.ppm', '01597_00002.ppm']
Number of training images for 00022 : 375
Training samples for 00022 : ['01655_00002.ppm', '00840_00001.ppm', '00936_00001.ppm', '01055_00002.ppm', '01155_00000.ppm']
Number of training images for 00011 : 7
Training samples for 00011 : ['01507_00002.ppm', '00271_00003.ppm', '00271_00000.ppm', '01507_00001.ppm', '00271_00001.ppm']
Number of training images for 00021 : 43
Training samples for 00021 : ['01832_00000.ppm', '00684_00000.ppm', '01868_00001.ppm', '01258_00001.ppm', '00554_00002.ppm']
Number of training images for 00048 : 11
Training samples for 00048 : ['00128_00000.ppm', '00128_00001.ppm', '00091_00000.ppm', '00091_00002.ppm', '01923_00001.ppm']
Number of training images for 00033 : 12
Training samples for 00033 : ['00339_00000.ppm', '00343_00002.ppm', '00339_00002.ppm', '00343_00001.ppm', '00340_00001.ppm']
Number of training images for 00037 : 98
Training samples for 00037 : ['01344_00001.ppm', '01355_00002.ppm', '00975_00002.ppm', '00163_00001.ppm', '01346_00002.ppm']
Number of training images for 00003 : 15
Training samples for 00003 : ['00211_00000.ppm', '00211_00001.ppm', '00391_00000.ppm', '01291_00000.ppm', '00207_00001.ppm']
Number of training images for 00040 : 242
Training samples for 00040 : ['01791_00002.ppm', '01177_00001.ppm', '01411_00000.ppm', '00448_00001.ppm', '00829_00000.ppm']
Number of training images for 00030 : 37
Training samples for 00030 : ['00686_00000.ppm', '00405_00001.ppm', '01293_00000.ppm', '00625_00000.ppm', '00328_00000.ppm']
Number of training images for 00047 : 147
Training samples for 00047 : ['00487_00000.ppm', '01410_00002.ppm', '01596_00000.ppm', '00498_00001.ppm', '01694_00002.ppm']
Number of training images for 00026 : 6
Training samples for 00026 : ['00246_00002.ppm', '00173_00000.ppm', '00246_00000.ppm', '00173_00001.ppm', '00173_00002.ppm']
Number of training images for 00039 : 196
Training samples for 00039 : ['00923_00000.ppm', '01909_00002.ppm', '01838_00001.ppm', '00944_00001.ppm', '00626_00001.ppm']
Number of training images for 00036 : 18
Training samples for 00036 : ['01687_00000.ppm', '00564_00000.ppm', '01285_00000.ppm', '00564_00002.ppm', '01457_00001.ppm']
Number of training images for 00044 : 48
Training samples for 00044 : ['01966_00000.ppm', '01597_00000.ppm', '00228_00002.ppm', '00228_00001.ppm', '00289_00002.ppm']
Number of training images for 00050 : 15
Training samples for 00050 : ['00320_00001.ppm', '00824_00000.ppm', '00581_00001.ppm', '00322_00000.ppm', '00584_00002.ppm']
Number of training images for 00054 : 118
Training samples for 00054 : ['01757_00001.ppm', '01415_00002.ppm', '00489_00001.ppm', '01937_00001.ppm', '00162_00002.ppm']
Number of training images for 00060 : 9
Training samples for 00060 : ['01892_00000.ppm', '01877_00002.ppm', '00209_00002.ppm', '01892_00001.ppm', '00209_00000.ppm']
Number of training images for 00042 : 35
Training samples for 00042 : ['00076_00000.ppm', '00076_00002.ppm', '01582_00002.ppm', '01582_00000.ppm', '01166_00002.ppm']
Number of training images for 00006 : 18
Training samples for 00006 : ['00220_00000.ppm', '00215_00001.ppm', '00215_00000.ppm', '00326_00001.ppm', '00327_00002.ppm']
Number of training images for 00051 : 27
Training samples for 00051 : ['01127_00002.ppm', '01878_00000.ppm', '01761_00000.ppm', '00902_00002.ppm', '00470_00000.ppm']
Number of training images for 00000 : 15
Training samples for 00000 : ['01797_00000.ppm', '01797_00002.ppm', '01798_00002.ppm', '01797_00001.ppm', '01799_00002.ppm']
Number of training images for 00016 : 9
Training samples for 00016 : ['00444_00001.ppm', '00332_00002.ppm', '00444_00002.ppm', '00444_00000.ppm', '00332_00001.ppm']
Number of training images for 00035 : 60
Training samples for 00035 : ['00378_00002.ppm', '00345_00001.ppm', '00588_00000.ppm', '01711_00001.ppm', '00345_00002.ppm']
Number of training images for 00008 : 27
Training samples for 00008 : ['01578_00001.ppm', '01016_00002.ppm', '00046_00001.ppm', '01520_00001.ppm', '01578_00002.ppm']
Number of training images for 00007 : 157
Training samples for 00007 : ['01801_00000.ppm', '01433_00000.ppm', '01159_00002.ppm', '00748_00001.ppm', '00618_00002.ppm']
Number of training images for 00001 : 110
Training samples for 00001 : ['00801_00002.ppm', '01103_00002.ppm', '00468_00002.ppm', '01651_00000.ppm', '00806_00000.ppm']
Number of training images for 00032 : 316
Training samples for 00032 : ['01848_00000.ppm', '00192_00002.ppm', '01818_00002.ppm', '01849_00000.ppm', '01828_00000.ppm']
Number of training images for 00041 : 148
Training samples for 00041 : ['01778_00000.ppm', '01078_00000.ppm', '01371_00002.ppm', '01449_00000.ppm', '00889_00001.ppm']
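The listing above reveals a pronounced class imbalance, from 6 training images for class 00026 up to 375 for class 00022. A small helper (hypothetical, not part of the original notebook) can summarize the spread:

```python
import os

# Summarize per-class image counts for a directory-per-class layout,
# returning the counts plus the smallest and largest class sizes.
def class_distribution(train_dir):
    counts = {c: len(os.listdir(os.path.join(train_dir, c)))
              for c in sorted(os.listdir(train_dir))}
    return counts, min(counts.values()), max(counts.values())
```

On this dataset the min/max returned would be 6 and 375; an imbalance of that size is worth keeping in mind when interpreting plain accuracy scores.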
# Plot some training images from the dataset
nrows = len(CLASS_LABELS)
ncols = 4
training_examples = []
example_labels = []
fig = plt.gcf()
fig.set_size_inches(ncols * 4, nrows * 3)
for c_label in CLASS_LABELS:
    training_class_dir = os.path.join(TRAIN_DIR, c_label)
    training_class_files = os.listdir(training_class_dir)
    for j in range(ncols):
        training_examples.append(training_class_dir + '/' + training_class_files[j])
        example_labels.append(c_label)
# print(training_examples)
# print(example_labels)
for i, img_path in enumerate(training_examples):
    # Set up subplot; subplot indices start at 1
    sp = plt.subplot(nrows, ncols, i+1)
    sp.text(0, 0, example_labels[i])
    # sp.axis('Off')
    img = mpimg.imread(img_path)
    plt.imshow(img)
plt.show()